Text File | 1992-04-12 | 62.7 KB | 1,106 lines |
-
-
- CHAPTER IV
-
- RESULTS AND DISCUSSION OF FINDINGS
-
-
- Purpose
-
- This case study addressed the information retrieval
- design potential of hypertext systems. It studied the
- implementation of desirable information retrieval features
- within a working hypertext authoring system.
-
- The study focused on hypertext implementation of
- traditional print-oriented information retrieval methods.
- The investigator wanted to determine the extent to which
- these traditional methods could be achieved, emulated, or
- otherwise incorporated within a commercial, end-user,
- implementation of a hypertext system.
-
-
- Results of the Case Study
-
- The investigator developed the previously mentioned
- Information Access Model (IAM) <app-a>, as a conceptual
- representation of the approaches and features used in
- traditional information retrieval systems. He refined the
- IAM outline to produce an interview schedule <app-b>.
-
- The investigator interviewed Neil Larson and Tony
- Phillips, the two principals involved in the production and
- application of the subject hypertext system, during March
- 11-15, 1991. The interviews were held at the separate hypertext
- developer and publisher sites in Berkeley and Kentfield,
- California.
-
- As described in Chapter III, the investigator
- provided the subjects with copies of the schedule prior to
- the interview session. He also reviewed the contents with
- them prior to the interview, to ensure understanding of the
- terminology and question intent. The sessions were
- recorded in note form and also tape-recorded. The
- investigator later reviewed the notes and tape recordings
- in detail to produce detailed interview summaries (See
- Appendixes D <app-d> and E <app-e>). The summaries were
- then used to prepare summary tables and graphs presenting
- results of the IAM schedule responses.
-
- As the actual software developer, Larson was more
- familiar with the technical abilities of the system.
- Phillips is the author/producer of the large hypertext
- system described previously. His focus on the pragmatic
- editorial approach occasionally led him to categorize
- various schedule items as being achievable or as a matter
- of editorial option, when the functionality was actually
- already present. The investigator was able to determine
- this, since he had extensive conversation with both
- individuals.
-
- Accordingly, the table summaries of system abilities
- are based more on Larson's interview schedule. The
- interview summaries nevertheless record both individuals'
- responses to the items. The summaries provided only a base for
- the final tabular ratings. The investigator first compared
- the responses of the two interviewees in detail. He then
- balanced the interview findings with a number of supporting
- external information sources. These included: detailed
- examination of system documentation and the MaxThink
- newsletter; direct observation of the DaTa CD-ROM
- production operation; informal interviews with several
- MaxThink users; direct hands-on use of the software; and
- follow-up telephone conversations with the principals.
-
- The results of the ratings are presented below. The
- discussion is grouped according to the major divisions of
- the IAM outline: A) Archive & Transaction Support System;
- B) Information Access System; and C) Control Mechanisms.
- Section A of the IAM schedule will be summarized in text
- discussion. Discussion of results in sections B and C of
- the schedule will begin with the respective summary tables
- of findings, then continue with discussion of the
- individual items. Each part will conclude with a short
- summary, using graphs to portray the grouping of results.
-
- Archive and Transaction Support System
-
- The investigator began with a conceptual model of
- traditional information retrieval systems. This model was
- mentioned earlier in this paper, as the Information Access
- Model (IAM). The complete IAM is found in Appendix A
- <app-a>. The investigator reviewed major writers on the
- topic, from those identified in Chapter II. For the
- purposes of constructing the IAM, he relied most on Borko
- and Bernier (1978) <refs -borko>, Cleveland and Cleveland
- (1990) <refs -cleveland>, Foskett (1982) <refs -foskett>,
- Meadow (1973) <refs -meadow>, Milstead (1984) <refs -milstead>,
- Taylor (1986) <refs -taylor>, and Vickery (1973)
- <refs -vickery>.
-
-
- Background of the Model
-
- The IAM <app-a> was a concise definition or portrayal
- of a generalized information retrieval system. It was
- intended as a global listing of information retrieval
- features. The IAM document served partially to communicate
- the information retrieval concepts to the subject
- interviewees and partially to focus the study on assessment
- of relevant information access features.
-
- There were obvious similarities in portrayals of
- information systems by the above-listed writers. The
- systems were generally described as having the purpose of
- allowing users to effectively and efficiently access the
- contents of documents or records covered by the particular
- system. As Foskett writes <refs -foskett>:
-
- . . . the problem that we have to face is that of
- ensuring that individuals who need information can
- obtain it with the minimum of cost (both in time and
- money), and without being overwhelmed by large amounts
- of irrelevant matter. (1982, 1)
-
- This reflects the general pragmatic approach of these
- writers, who appear to view these systems as practical
- tools, rather than abstract or theoretical representations
- of knowledge.
-
- Cleveland and Cleveland <refs -cleveland> summarize
- the functions of an information system as follows:
-
- For an information system to carry out the information
- process, at least six distinct functions are required:
- (1) the acquisition of the necessary and appropriate
- documents,
- (2) the preparation and representation of the content of
- these documents,
- (3) the coding of the content indicators for ease of
- manipulation,
- (4) the organized storage of those documents and their
- indicators in separate files,
- (5) the development of operational search strategies,
- and
- (6) the physical dissemination of the retrieval results.
- At the center of this system is the procedure that
- identifies and represents the content of the collection
- to the user; in most cases this is an index (1990, 38).
-
- Traditional, paper-based information retrieval
- systems have generally used some form of the index approach
- for access to documents. This same approach has carried
- over to the present generation of automated information
- retrieval systems.
-
- For example, Vickery <refs -vickery> notes that the
- general point of entry into an information system is a "list
- of words," whether it be an index, a table of contents, or a
- list linked to a classification code (1973, 87). He also describes
- three main approaches to creating the representation of
- document content, in the index file. These are: (1) simple
- extraction of terms from the source document; (2) selective
- extraction of terms, guided by frequency of usage or
- significance in the source document; and (3) assignment of
- pre-existing keys, or a defined indexing vocabulary.
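-
- The three approaches can be contrasted in a minimal Python
- sketch. It is an illustration only and is not drawn from
- Vickery or from any system discussed here. The stopword list,
- the two-occurrence frequency threshold, the small vocabulary
- table, the sample sentence, and all function names are
- assumptions made for the example.
-
```python
import re
from collections import Counter

# Hypothetical stopword list and controlled vocabulary, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
VOCABULARY = {"audit": "AUDITING", "audits": "AUDITING", "tax": "TAXATION"}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def simple_extraction(text):
    """Approach 1: take every non-stopword term found in the document."""
    return sorted({t for t in tokens(text) if t not in STOPWORDS})


def selective_extraction(text, min_count=2):
    """Approach 2: keep only terms that occur at least min_count times."""
    counts = Counter(t for t in tokens(text) if t not in STOPWORDS)
    return sorted(t for t, n in counts.items() if n >= min_count)


def assigned_keys(text):
    """Approach 3: map document terms onto a predefined indexing vocabulary."""
    return sorted({VOCABULARY[t] for t in tokens(text) if t in VOCABULARY})


# Hypothetical sample text.
sample = "The audit of tax records: audit planning and audit reports."
print(simple_extraction(sample))
print(selective_extraction(sample))
print(assigned_keys(sample))
```
-
- The first two functions draw their terms from the document
- itself, while the third can only return keys that already
- exist in the vocabulary table. That is the distinction
- between extraction and assignment drawn above.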
-
- Both Vickery (1973) <refs -vickery> and Meadow (1973)
- <refs -meadow> note the importance of a standardized index
- language, or controlled vocabulary, to ensure consistent usage
- by different indexers, as well as between indexers and index
- users. Problems of consistency arise as index languages become
- larger and more complex. Index language use becomes more
- difficult for indexers, resulting in lowered productivity,
- index quality problems, and higher costs of indexing. The
- larger language can also be much more difficult for the index
- user to comprehend and use (Meadow 1973).
-
- After digesting major writers addressing information
- retrieval system description, the investigator developed
- generalized flowchart representations illustrating working
- information retrieval systems. These flowcharts were used
- in creating the IAM and in describing terminology to the
- interviewees. The general level flowchart is included as
- Figure 1.
-
- >>>> FIGURE 1 GOES HERE
-
- This simplified representation of information system
- processing parallels many of the functions described above
- by Cleveland and Cleveland. The entry block at the top
- represents document selection and acquisition. This is
- followed by the document analysis operation. The next
- operation involves document concept identification and
- representation of the document by keys or descriptors.
- Flowchart blocks on either side of this "key creation"
- process represent application of the indexing or
- classification approaches, and the editorial and quality
- control mechanisms selected for the particular information
- retrieval system implementation. The next operation
- involves final processing and production of the information
- system. The terminal operation represents the completed,
- distributed information system.
-
- Figure 2, depicting the same general process, is
- taken from Vickery (1973, 88) <refs -vickery>. This flow
- diagram emphasizes the duality of the retrieval language used
- by both the indexer and user. Vickery shows that both parties
- must use a common retrieval language in order to produce
- compatible index record and query forms. The system
- performance will reflect the success by both parties in the
- appropriate use of this common language.
-
- >>>> FIGURE 2 GOES HERE
-
- Going one step further, Figure 3 summarizes the
- design components of the standardized indexing or
- classification approach. This is the index language which
- is applied in the general information system processing
- discussed above, and illustrated in Figure 1. This
- flowchart shows that the standardized index or
- classification approach results from a combination of
- decisions. These include design decisions regarding:
- choice of access points; of available information access
- methodologies or operations (e.g. index use, full text
- searching, hierarchical or taxonomic approaches); and
- finally, the design of editorial and quality control
- procedures.
-
- >>>> FIGURE 3 GOES HERE
-
- System Study Results
-
- The investigator extensively analyzed the production
- and implementation of the previously described DaTa
- hypertext system for accounting and auditing domain
- information. This is the main information system
- publication produced using the subject MaxThink authoring
- system. He studied the DaTa product in detail during the
- site visit, interviewed the principals extensively, and
- examined the authoring system software and documentation in
- detail. He also obtained several issues of the DaTa CD-ROM
- hypertext for later examination.
-
- Figure 4 is a flowchart of the general production
- process of the subject information system. The production
- process is described in detail in the interview summaries,
- Appendixes D <app-d> and E <app-e>, particularly in the
- Phillips interview summary.
-
- >>>> FIGURE 4 GOES HERE
-
- The investigator saw many parallels between the DaTa
- production operation and the generalized information
- retrieval system workflow model. The DaTa operation begins
- with acquisition of source text (mostly in hardcopy form).
- The first document processing step is thus to use optical
- character recognition (OCR) scanning for conversion of hard
- copy to machine-readable format. These first operations
- are comparable to standard document acquisition. This is
- followed by the editorial operation of splitting text into
- smaller, single topic, text nodes, and reformatting it for
- best display screen presentation. This corresponds to the
- document processing step in Figure 1.
-
- The third step is the major intellectual operation of
- adding hypertext organization and embedding links into the
- document text. This involves several operations: updating
- of the network hierarchies, inserting of the hypertext
- links into document text, and generation of the Keyword Out
- of Context (KWOC) index. This part of the operation
- clearly parallels the document representation and creation
- of descriptors/keys in traditional information retrieval
- system processing.
-
- The final steps of DaTa production involve the update
- processing, manufacturing, and distribution steps of the
- operation. These steps are identical to the functions of
- the last two steps of the generalized information retrieval
- system workflow depicted in Figure 1.
-
- The investigator has worked in and managed major unit
- record files (newspaper clipping files), as well as
- newspaper index and text database operations. It was clear
- to him that the workflow of the DaTa hypertext system was a
- sophisticated information retrieval system production
- operation. Although untrained in traditional information
- retrieval system methods, the system designers nevertheless
- arrived at pragmatic equivalents to many standard
- techniques.
-
- The practicality of the information retrieval design
- orientation is illustrated by Larson's description of his
- system design goals, presented during the interviews. He
- summarized these as:
-
- 1. Emphasis on ease of use . . . the simplest,
- easiest, most intuitive, possible user interface. In
- Larson's words, "So you never have to think about it."
- 2. Designing for the "lowest common denominator"
- hardware platform. The DaTa system runs on any IBM-
- compatible hardware, from the earliest Intel 8088 chip IBM-
- PCs to the current Intel 80486 chip units. It runs on any
- version of MS-DOS starting with Version 2.1. Random Access
- Memory (RAM) requirement is a modest 512K. The system will
- work with either monochrome or color monitors.
- 3. Providing a sophisticated domain area matrix
- (hierarchical network taxonomy), with a great many highly
- redundant and highly cross-referenced information approach
- trails.
- 4. Providing overlapping, complementary
- information access methods, which are characterized as:
- a. Taxonomic approach, using hierarchical
- networks;
- b. Linguistic approach, using online KWOC index
- and glossary;
- c. Associative network approach, using embedded
- hypertext links.
-
-
- Information Access System
-
- This section <app-b 2 19> of the IAM covered the
- features within the hypertext system which supported
- information access. The evaluation and tabular recording of
- IAM responses involved detailed analysis of the interview
- summaries, balanced against corroborating evidence. This
- evaluation process was explained in detail earlier in this
- chapter, in the "Results of the Case Study" section.
-
- "Section B.1." Access Points
-
- These items <app-b 2 21> addressed information system
- mechanisms for providing access by different "avenues of
- approach." The access points included are representative of
- those used in various traditional information systems.
-
- Access Point Item Responses
-
- Table 1 lists the results of the responses to the
- Access Points items. Eleven of the fourteen were rated as
- being present in the subject hypertext system. Three of
- the items were rated as easily achievable through editorial
- decision or use of external software.
-
- <TABLE1>
-
- Item B.1.a., the Main File Sequence, referred to the
- suitability of the basic organizational sequence as an
- access point. Traditional information systems
- implementations allowing this include document
- classification systems, alphabetical filing systems, and
- sequentially numbered or coded transaction files. The
- MaxThink system uses ASCII disk files, with alphanumeric
- DOS file names. These document files may optionally be
- further broken down for storage in named hard disk
- subdirectories. Such a storage approach lends itself to
- the formation of topical subdirectories and coded or
- standard file naming approaches. This item was rated
- Present.
-
- Item B.1.b. covered retrieval by author. This was
- rated present, since many MaxThink system retrieval
- features may be used for author access. These features
- include hierarchical taxonomies, online indexes,
- subdirectory organization, and other approaches.
-
- Item B.1.c. covered title access. This is
- occasionally provided in the DaTa CD product. Title
- representation is achievable upon editorial decision. This
- item was rated as present.
-
- Items B.1.d. through B.1.d.ii, including name forms,
- personal and corporate names, may all be optionally
- provided by editorial decision. They were therefore rated
- as present.
-
- Item B.1.e. referred to keyword retrieval. This is
- available using several approaches, primarily the
- "Glossary" (TM) KWOC index. This is an online index
- covering file descriptive header text lines as well as the
- text of the added taxonomical descriptions. The Glossary
- (TM) index excludes stopwords. MaxThink hypertexts also
- provide gateway interface to "SEARCHWORD," a string-
- searching program module, and to "CD-INDEX," a full text
- index access module. Hypertext system links can also
- transparently execute or "call up" other string-searching
- or text database external programs. Keyword access was
- therefore rated as present.
-
- B.1.f. addressed subject, topical, or concept access.
- This is provided by several approaches, including
- hierarchical taxonomy, hypertext network interconnections,
- hypertext associative links, and the KWOC index. This item
- was rated as present.
-
- Items B.1.g. through B.1.j. dealt with geographic,
- date or chronological, [foreign] language, and document
- format access points. These may all be achieved or
- provided as a matter of editorial decision. The system
- indexing mechanisms are provided by the various retrieval
- functions or approaches within the system, as described
- above. The items were rated as present. They may
- optionally be provided through an interface to external
- software.
-
- Item B.1.k., access to document position or location,
- can optionally be provided by editorial decision. The
- principals felt it would be labor-intensive and impractical
- to do this using the hierarchy or hypertext links. They
- recommended this be accomplished by interfacing to an
- external searching program with this ability. The item was
- rated as easily achievable.
-
- Item B.1.l. [the last character is the letter L]
- covered retrieval via automated search of data in specified
- field locations. Boolean searching and field specification
- database features are not present in the subject hypertext
- system. The system can, however, provide such access by
- interface execution of a program with these capabilities.
- For example, this investigator has built hypertext systems
- with the subject software package, using link execution of
- external text database programs. These programs included
- Zyindex (TM), BiB/SEARCH (TM), and Nutshell Plus (TM).
- This feature was therefore rated as easily achievable.
-
-
- Access Points Summary
-
- All traditional access points can be implemented with
- the subject hypertext system, or accomplished by interface
- to external programs. This was established during the
- interviews, and verified by direct examination of the
- subject system authoring software and the DaTa CD-ROM
- hypertext application.
-
- Figures 5 and 6 graphically illustrate the
- proportional placement of the responses. Figure 5 shows
- that 78.6% of the access points were present in the subject
- application. The remaining 21.4% fell into the easily
- achieved category. Figure 6 presents more detailed
- information in bar graph form. Eleven of the access point
- items were present, three were easily achievable, and no
- items were categorized as not possible or practical.
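-
- The proportions follow directly from the item counts. A
- short Python check, using only the counts reported above
- (the variable names are arbitrary), is:
-
```python
# Rating counts for the fourteen Access Points items, as reported in the text.
ratings = {"present": 11, "easily achievable": 3, "not possible or practical": 0}

total = sum(ratings.values())
# Percentage share of each rating category, rounded to one decimal place.
shares = {label: round(100 * count / total, 1) for label, count in ratings.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```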
-
- >>>> FIGURES 5 AND 6 GO HERE
-
- "Section B.2." Access Approaches
-
- The items in Section B.2. <app-b 6 6> of the schedule
- addressed the information system access approaches, devices,
- or methods provided by the hypertext system. The IAM
- features represented by these schedule items were based on
- traditional information retrieval approaches.
-
- Access Approaches/Systems Responses
-
- Table 2 gives the results of the access approaches
- section responses. Seven of the nineteen items were rated
- as being present in the subject hypertext system. Eleven
- of the items were rated as easily achievable through
- editorial decision or use of external software. Only one
- of the nineteen items was judged as not possible or
- practical.
-
- <TABLE2>
-
- The first part of the access approaches section,
- B.2.a., covered the general classification scheme
- approaches. Item B.2.a.i. dealt with hierarchical taxonomy
- ability. Hierarchical knowledge representations, or
- taxonomies, are regarded as one of the easiest to use, yet
- most effective, approaches to information retrieval. The
- obviousness of well-designed choices which are presented at
- each level of even a complex hierarchical structure means
- that even novices are able to use them effectively (Meadow
- 1973) <refs -meadow>. Glynn and Di Vesta (1977) <refs -glynn>
- have shown that the use of a hierarchical or outline structure
- measurably aided subjects in better comprehension of a
- knowledge domain and its component relationships. They
- reported that the logical and coherent approach of
- hierarchical learning and retrieval aids also helped subjects
- perform better in recalling and inferring specific facts.
-
- The basic access design system for the MaxThink
- system hypertexts grows from the efficient use of taxonomic
- structures and complex network interlinking. The
- Houdini (1987a) <refs -houdini> "three-dimensional outliner"
- allows easy interconnection of separate hierarchies into
- complex matrix networks. This matrix outliner has the ability
- to quickly link any network node or ASCII filename reference
- to any other point in the network. It can also link between
- separate networks. Therefore, the author is not limited to one
- inflexible hierarchy.
-
- Instead, the author can use rich interconnection
- across multiple hierarchical taxonomies. The author can
- place a single item into many appropriate retrieval
- hierarchy paths, in a manner similar to filing under
- multiple entries in a card catalog. When advisable, the
- author may also interconnect entire hierarchical levels
- within and across networks. For example, there might be an
- interconnection from a subtopic of the "Pet Care"
- hierarchy, across to the "Veterinary Medicine" network, so
- it may also function as a subtopic of an "Immunization
- Research" topic. The MaxThink (1987b) <refs 16 4> basic
- outliner and Houdini matrix outliner programs both support the
- creation of these intertwined networks.
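-
- The idea of one item being reachable through several
- hierarchy paths can be sketched in Python as follows. This is
- not the MaxThink or Houdini file format, which is not
- described here. The dictionary layout, the function name, and
- the "Grooming" and "Vaccination Schedules" topics are
- hypothetical; only the "Pet Care," "Veterinary Medicine," and
- "Immunization Research" names come from the example above.
-
```python
# Each network maps a parent topic to its child topics.
# "Grooming" and "Vaccination Schedules" are hypothetical topic names.
networks = {
    "Pet Care": {
        "Pet Care": ["Grooming", "Vaccination Schedules"],
    },
    "Veterinary Medicine": {
        "Veterinary Medicine": ["Immunization Research"],
        # Cross-network link: the same node is filed under a second parent.
        "Immunization Research": ["Vaccination Schedules"],
    },
}


def paths_to(target, networks):
    """Return every root-to-node path that reaches target in any network."""
    found = []

    def walk(tree, node, trail):
        trail = trail + [node]
        if node == target:
            found.append(trail)
        for child in tree.get(node, []):
            walk(tree, child, trail)

    for root, tree in networks.items():
        walk(tree, root, [])
    return found


for path in paths_to("Vaccination Schedules", networks):
    print(" > ".join(path))
```
-
- The sketch assumes the sample networks contain no circular
- links; a network with cycles would need a visited-node check
- before descending.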
-
- This ability to add complex multiple dimensions to
- hierarchies or outlines retains the basic ease of use of an
- outline structure, yet adds representational and retrieval
- power far above that of simple or "flat" hierarchical
- structures (Danielsen 1989) <refs -danielsen>. Besides
- allowing creation of clear, understandable hierarchical
- knowledge structures for the user interface, MaxThink's
- outliner and matrix outliner tools also add great production
- efficiencies to the authoring process.
-
- The MaxThink tools enable a two-phase approach to
- hypertext linking. They thus eliminate the necessity for
- an author to deal with the enormous number of possible
- links within document texts. The process can now be
- divided into two more manageable operational steps:
-
- 1) "Macro linking" - Document positioning in the
- global domain matrix. This is handled with the
- hierarchical or matrix outliners, perhaps the only
- efficient tools for this work. This task involves the
- major positioning of the document or entity in the correct
- position/document cluster of the hierarchical networks.
- This step is comparable to classifying a book into the
- correct subject location in the Dewey or Library of
- Congress classifications. The Houdini matrix outliner
- also allows multiple hierarchical paths or access trails
- leading to the same document or document cluster.
-
- 2) "Micro linking" - Placement of associative links
- within document texts or images. This task consists of
- adding the embedded links or jumps to related and relevant
- items. This may be done using one of the many specialized
- authoring tools. This is now a relatively quick and easy
- task, since the author does not have to deal with the
- global universe of linking possibilities. The hierarchy
- placement has already placed the document in the proper
- position in the domain conceptual matrix. Authors now need
- only deal with making links within a more limited number of
- closely related documents in a topical cluster, or to other
- major network hierarchical nodes.
-
- The end result of this workmanlike approach is
- surprising speed in handling document or hypermedia item
- insertion into the networks. As illustration, Phillips
- processes approximately 1000 screens per week, including
- placement in the domain taxonomy, and embedding of internal
- associative links.
-
- Wayne McPhail, president of Metaphor - The Hypermedia
- Group, in Hamilton, Ontario, a hypertext and hypermedia
- production group, is another hypertext author who appreciates
- the efficiency of the MaxThink approach. He writes
- <refs -mcphail> that his group originally selected the
- MaxThink authoring system, primarily because "MaxThink has
- developed a number of powerful, elegant tools for creating
- intelligent hierarchical structures . . . which allow users to
- easily build hierarchical and knowledge matrix systems which
- can be converted into hyperdocuments . . ." (1991, 461). He
- continues:
-
- In the years following the release of [the first]
- hyperdocument, I explored a number of hypertext systems
- including KnowledgePro, Guide, Black Magic, and Matrix
- Layout. Each had its own appeal, but I found myself
- returning to MaxThink's products because they allowed
- me to develop hypertexts quickly and efficiently.
- (1991, 462)
-
- This writer has had similar experience, in producing
- three hypertext systems with the MaxThink authoring system.
- The text content of these systems ranged from 300,000-
- 750,000 characters. For each project, the editorial tasks
- of designing the document coverage, writing and assembling
- texts, splitting them into logical nodes, and writing of
- bridging material, took two to three weeks. In all three
- cases, the actual building of the hyperdocument network
- took approximately two hours. This included building the
- system network, creating hierarchy menu screens, embedding
- links within the text nodes, and doing final "cosmetic
- work" on system screens.
-
- This is hardly the "labor-intensive" hypertext
- authoring trap of which many writers have complained or
- cautioned. The MaxThink hierarchical manipulation tools
- are both editorially powerful and efficient. Therefore,
- the B.2.a.i. item dealing with hierarchical taxonomy
- ability was rated as present.
-
- Item B.2.a.ii. dealt with enumerative, universal
- classifications. This item referred to complex, complete,
- predefined or fixed, universal classifications, such as the
- Dewey Decimal or Bliss classifications. The subject
- principals felt adoption of such a classification to be an
- editorial option. Hypertext system linkages can be used
- for effective representation of any desired classification
- scheme. The developers noted that the import features of
- the MaxThink outliners would make it quite efficient to
- import ASCII files carrying the information for either a
- flat or a hierarchical classification scheme. This ease of
- importing classification information sidesteps a major
- obstacle in other systems. Transfer of existing taxonomies
- has been a major problem in previous efforts (Björklund
- 1990b) <refs 3 4>. This item was rated as being easily
- achievable.
-
- Items B.2.a.iii and B.2.a.iv concerned literary
- warrant classifications and faceted classifications. As
- with the preceding item, the principals felt that these
- classifications could be easily represented with a
- hypertext taxonomy. Again, an ASCII format base file
- structure could easily be transported into the MaxThink
- system representation. These items were rated as easily
- achievable.
-
- Major category B.2.b. covered the general indexing
- types. The first of these, Item B.2.b.i., was the
- alphabetical index. This was implemented in the form of
- the alphabetized substantive element listing for the
- Glossary (TM) KWOC index. Both interviewees agreed it
- would be a simple operation to represent standard
- alphabetical indexes, and to embed hypertext links to the
- source document texts. They also agreed that it would be
- more efficient to use external indexing software to create
- such index listings, than to attempt manual indexing. They
- advised the use of specialized indexing packages. This
- item was rated as easily achievable.
-
- Item B.2.b.i.A. referred to the selection or
- assignment of keywords for classification or indexing
- purposes. The MaxThink system provides authoring utilities
- for simple term extraction from source documents. The
- developers plan to develop more sophisticated term extraction
- utilities, and have initially examined material covering such
- approaches (Pao 1978 <refs -pao>; Tenopir 1990
- <refs -tenopir>). The Glossary (TM) KWOC system effectively
- accomplishes simple term extraction from titles and taxonomy
- content descriptions, using the stylized KWOC format. This
- item was rated as present.
-
- Items B.2.b.i.B. and B.2.b.i.C. covered the use of
- controlled vocabulary term assignment and relative indexing
- methods. The principals felt these to be a matter of
- editorial decision. They stated that the hypertext
- associative linking could easily represent such index
- approaches. They advised use of external software for
- efficient maintenance of such vocabularies or indexes.
- These items were rated as easily achievable.
-
- Item B.2.b.ii. covered the general category of term
- manipulation indexes. This category was rated as present,
- since a KWOC index is provided.
-
- Item B.2.b.ii.A. referred to simple permuted or
- rotated indexes, often found in the general form of Keyword
- in Context (KWIC) indexing. The MaxThink production
- implementation does not presently include this type of
- index, since the designers preferred the KWOC format as
- easier for users. The principals agreed that this type of
- index could effectively be represented in a hypertext
- representation. Again, they recommended use of an external
- program to produce a KWIC index. The item was rated as
- easily achievable.
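-
- For readers unfamiliar with the rotated form, a minimal
- Python sketch of KWIC line generation is given here. It
- stands in for the kind of external program the principals
- recommended and does not represent any particular product.
- The stopword list, the "/" marker for the original start of
- the title, and the sample title are assumptions.
-
```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic(titles):
    """Return rotated (keyword-first) lines, one per significant word."""
    lines = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            # Rotate the title so the keyword leads; "/" marks the
            # point where the original title began.
            tail = ["/"] + words[:i] if i else []
            lines.append(" ".join(words[i:] + tail))
    return sorted(lines, key=str.lower)


for line in kwic(["Auditing of Inventory"]):
    print(line)
```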
-
- Item B.2.b.ii.B. represented term manipulation
- indexes, ordered by an extracted term element. This refers
- to the Keyword Out of Context (KWOC) index type, where the
- unrotated term context lines are sorted by the substantive or
- index term, rather than using the rotated line form. KWOC
- indexing is an integral part of the current MaxThink
- production implementation.
- <glossary.txt example of a MaxThink KWOC index>
- KWOC index production is accomplished by a MaxThink utility
- program. This item was therefore rated as present.
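-
- The KWOC arrangement itself can be sketched in a few lines
- of Python. This is not the MaxThink utility, whose internals
- are not documented here. It only shows the general technique:
- each significant word becomes a sort key, and the context
- line is kept unrotated beside it. The stopword list, sample
- titles, and column width are assumptions.
-
```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwoc(titles):
    """Return (keyword, unrotated title) pairs sorted by keyword."""
    entries = []
    for title in titles:
        for word in title.split():
            key = word.strip(".,;:").lower()
            if key and key not in STOPWORDS:
                entries.append((key, title))
    return sorted(entries)


# Hypothetical header lines standing in for file descriptions.
sample_titles = ["Auditing of Inventory", "Inventory Tax Rules"]
for key, title in kwoc(sample_titles):
    # The keyword is shown in a left-hand column, the full line beside it.
    print(f"{key.upper():<12}{title}")
```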
-
- The next group of term manipulation category items
- were B.2.b.ii.C. and B.2.b.ii.D. The first item covered
- string indexing, using algorithmic phrase or term
- relationship manipulation. Well-known examples of this
- category include PRECIS and CIFT indexing, respectively
- developed for the British National Bibliography and the
- Modern Language Association. The second item refers to
- chain indexing. In this manipulated index form, the
- constructed index string form reflects the basic embedded
- taxonomy or hierarchy. Cleveland & Cleveland (1990)
- <refs -cleveland> extensively discuss both of these index
- types.
-
- The interviewees agreed that both of these index
- types could be represented in the hypertext presentation.
- They recommended use of external software to create and
- manage these forms of indexes. Both items were therefore
- rated as easily achievable.
-
- The next item, B.2.b.iii., covered the classified
- index form. This index type is arranged in the
- alphanumeric order of a selected classification code. The
- principals agreed that this was an editorial option, which
- could be successfully implemented in the hypertext format.
- They again noted the ease of import of an ASCII file of a
- classified index table, using the MaxThink outliner
- software. This would mean efficient import and translation
- into a manipulable hypertext taxonomy format. The item was
- graded as easily achievable.
-
- Item B.2.b.iv. covered the category of coordinate
- indexing. This referred to retrieval using assigned
- descriptor or index terms, using simple or combined term
- queries. The CD-INDEX utility of the MaxThink authoring
- system (output illustrated in Appendix F) produces
- searchable full text indexes. The search component of the full
- text searching module delivers simple coordinate retrieval
- functionality. This item was rated as present.
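-
- Simple coordinate retrieval of this kind can be pictured as
- the intersection of term posting lists. The Python sketch below
- is a generic illustration and makes no claim about how CD-INDEX
- works internally; the names build_index and coordinate_search,
- and the node identifiers, are hypothetical.
-
```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def coordinate_search(index, terms):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    # Hypothetical node names and text, for illustration only.
    nodes = {"n1": "audit cash controls", "n2": "cash flow"}
    idx = build_index(nodes)
    print(sorted(coordinate_search(idx, ["cash", "audit"])))
```
-
- A single-term query corresponds to the "simple" case and a
- multi-term query to the "combined" case described above.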
-
-
- Item B.2.b.iv.A. represented the category of older,
- non-automated, coordinate searching methods. Some examples
- include edge-notched cards, "peekaboo" punched-hole card
- coordinated systems, and terminal digit coordination.
- These manual methods were judged as inappropriate for an
- automated implementation. The item was therefore rated as
- neither achievable nor applicable.
-
- Item B.2.b.iv.B. represented the database searching
- approach to coordinate retrieval. Although the MaxThink
- systems have string-searching and full text searching
- modules, they do not possess database features. These
- capabilities would include ability for field searching
- specification, searching for field value presence or
- absence, or for combinations of text and field values. The
- interviewees agreed that this ability could be added by
- editorial decision, using link execution of an appropriate
- external program. The item was rated as easily achievable.
-
- Item B.2.b.iv.C. covered full-text searching ability.
- As mentioned, the MaxThink systems offer simple string-
- searching and full text indexed retrieval program modules.
- These modules do not have sophisticated text retrieval
- abilities. Software developer Larson is subjectively
- opposed to dependence upon full text searching techniques,
- pointing to the many studies which demonstrate poor or
- uneven retrieval performance of the approach (Blair and Maron
- 1985 <refs -blair>; 1990 <refs 3 18>). He is therefore
- emphatically committed to the taxonomic and associative
- linking approaches.
-
- However, he has responded to hypertext author and end
- user demands by producing the text searching modules
- mentioned above. The MaxThink software automatically
- generates lists of hypertext links in response to user-
- specified terms. This approach is generally described as
- "dynamic linking" (Frisse 1988) <refs -frisse>. Larson's
- interface predictably uses hypertext link calls to execute the
- search modules.
-
- The searches generate lists of links to both text
- file nodes and hierarchy entries containing the desired
- terms. Use of these dynamically-generated link lists
- combines the specificity of text searching, and also
- retains the guidance value of existing embedded hypertext
- links. The user can use the generated list to make
- hypertext jumps to found items, and can also use the links
- within those items to continue his or her search. Larson's
- decision to also generate links into the hierarchy entries
- means that his text searching module gently guides the user
- back into the sophisticated taxonomy approach. This gives
- the user the benefit of both text searching brute force and
- the structured taxonomy.
-
- The interviewees both agree that their hypertext
- system developers or end users have the additional option
- of using link execution of more sophisticated external text
- searching programs. The item was rated as present.
- Appendix F <app-f> contains multiple screen print
- illustrations demonstrating the text-searching module.
- ***> NOT INCLUDED IN THIS HYPERTEXT VERSION <***
-
- Item B.2.b.v. referred to the faceted indexing
- approach. The interviewees concur that provision of this
- style of access is an editorial option. They note that it
- would be most effective to use external software to create
- a faceted indexing file, and then import it into the
- MaxThink system for translation into the hypertext format.
- This category was therefore rated as easily achievable.
-
- Item B.2.b.vi. refers to the citation indexing
- approach. Both interviewees agree that a citation index
- file could be created using a separate external program,
- and imported into the MaxThink system for translation into
- the hypertext format. The category was rated as easily
- achievable.
-
-
- Access Approaches/Systems Summary
-
- The responses to this section of the interview
- showed that all but approximately 5% of the traditional
- access approaches or systems can be implemented with or
- through the subject hypertext system.
-
- Figures 7 and 8 graphically illustrate the
- proportional placement of responses. Figure 7 shows that
- 36.8% of the approaches were present in the subject system.
- Many more of the items, some 57.9%, were in the easily
- achieved category. Only 5.3% of the approaches were rated
- in the not possible or practical group. Figure 8 presents
- this information in bar graph form. Seven of the items
- were present, eleven were easily achievable, and one item
- was categorized as not possible or practical.
-
- >>>> FIGURES 7 AND 8 GO HERE
-
- Only about one-third of the items fell into the
- present or implemented category, while more than half were
- rated as easily achieved. This contrasted with the first
- section of the schedule, where approximately two-thirds of
- the items were rated as being present. This pattern switch
- was in great part due to MaxThink developer editorial
- decision. Many access approaches were potentially
- possible, but unimplemented. This was due in large part to
- the principals' emphasis on the creation of taxonomic
- networks as the main tools for access. The interviewees
- frequently expressed their editorial view that many
- traditional information retrieval approaches are difficult
- to understand and use, and therefore tend to be
- inappropriate for novices or infrequent users. They
- emphasized that they have deliberately designed the DaTa
- hypertext to serve this class of user.
-
-
- "Section B.3." Control Mechanisms
-
- Section B.3. <app-b 11 11> of the schedule referred to
- devices provided for the purpose of editorial and quality
- control of an information system. Such devices or mechanisms
- offer control of such areas as taxonomy, vocabulary
- consistency, entry format, syntax, and item filing sequence.
-
- Control Mechanisms Results
-
- Table 3 lists the results of the Control Mechanisms
- section of the study. Out of fifteen items, nine were
- rated as present, five as easily achievable, and one as not
- possible or practical.
-
-
- <TABLE3>
-
- Item B.3.a. referred to the use of a classification
- schedule as the basis for the organization of the
- information system. Most examples of this approach utilize
- a fixed or published classification hierarchy. It was the
- opinion of the MaxThink principals that the flexible and
- adaptive taxonomy networks of their system are equivalent
- to a dynamic classification system. Phillips, the actual
- author of the DaTa product, in particular, habitually
- referred to his working representation of the accounting
- and auditing subject domain area as a "global matrix" or a
- "conceptual matrix." He used the Houdini matrix outliner
- tool to efficiently maintain the maze of interconnected or
- networked area hierarchies. At the time of interview, the
- current global network consisted of approximately two
- hundred interconnected networks. This item was rated as
- present.
-
- Item B.3.b. addressed traditional approaches to the
- maintenance and application of controlled subject
- vocabularies. The interviewees stated that vocabulary
- control was an editorial option, and agreed that it was
- basically necessary for information retrieval system
- quality control. They referred to several of their own
- basic applications of the concept. They use syntax and
- plural policies in building their KWOC index; they are
- developing synonym and thesaurus control utilities; they
- use stopword lists for the KWOC index. The item was rated
- as present.
-
- Item B.3.b.i. specifically addressed the maintenance
- of simple authority or headings files. Both principals
- agreed that this was an editorial decision, for optional
- inclusion. The MaxThink system does not presently include
- any kind of authority maintenance utility. They felt this
- function could be achieved using either manual or external
- software means. The item was rated as easily achievable.
-
- Item B.3.b.ii. referred to thesaurus maintenance.
- This was intended to identify a more sophisticated concept
- control approach than a simple authority list. The
- thesaurus authority file generally shows the full scope of
- term coverage, the relationships of broader terms, narrower
- terms, related items, and guides from synonymous to
- preferred terms (Cleveland & Cleveland 1990) <refs -cleveland>.
- The interviewees do not presently maintain full thesaurus
- control, but felt it was a desirable editorial option. They
- felt this function could be performed externally by using
- either manual or automated thesaurus maintenance. At time of
- writing, Larson had informed the investigator that MaxThink is
- currently developing a thesaurus maintenance program (Neil
- Larson, telephone interview, August 9, 1991). The item was
- rated as easily achievable.
-
- Item B.3.b.iii. covered the use of derived term
- methods. This refers to terms extracted directly from
- source document text, using either manual or automated
- processing approaches. Both agreed that this was an
- editorial option. DaTa author Phillips felt that a domain
- expert would not need such methods to extract document
- concepts; Larson felt that third party or external utility
- software could be useful. Larson is, however, considering
- development of term extraction utilities for quick
- identification of general content. This item was rated as
- easily achievable.
-
- Item B.3.b.iv. referred to use of a hierarchical
- searching thesaurus. This approach is sometimes used in
- full text or bibliographic index database systems. It
- allows a searcher to optionally use hierarchical term
- relationships to aid in term searching. This method is
- not relevant to hypertext associative linking, which is
- not a "searching" retrieval approach and therefore cannot
- utilize it. Because of this, the item was
- rated as neither achievable nor applicable. However, the
- interviewees noted that hypertext system authors may use
- link calls to execute an external database program. They
- agreed that hypertext system authors could easily provide
- the searching thesaurus approach by using an external
- program with this facility, such as Zyindex (TM) or
- MicroBASIS (TM).
-
- Item B.3.b.v. covered the generic approach of
- controlling term entry form, for vocabulary consistency.
- The interviewees felt this was an editorial decision, and
- could be achieved using either manual or automated means.
- The item was rated as present.
-
- Item B.3.b.v.A. referred to control of entry syntax,
- such as preference of noun or adjectival form, and entry
- construction approach. This was judged a matter of
- editorial policy. It may be handled manually or with
- external program support. At time of interview, the
- MaxThink DaTa product operation used manual application of
- entry syntax policy. The item was rated as present.
-
- Item B.3.b.v.B. referred to the standardization of
- entry "number," or consistency in singular or plural usage
- form. Again, the interviewees judged this a matter of
- editorial policy decision, which may be handled either
- manually or with external program support. The MaxThink
- DaTa product operation presently uses automatic
- depluralization in the Glossary (TM) KWOC index-building
- program. The item was rated as present.
-
- Item B.3.b.v.C. covered automatic depluralization in
- database searching. This method allows retrieval of either
- singular or plural noun forms, in response to entry of
- either a singular or plural query term. This is not
- relevant to the hypertext retrieval approach. However, the
- interviewees noted this could be achieved by linking to
- external searching software with such capability. The item
- was therefore rated as easily achievable.
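-
- The effect of depluralization in searching can be shown with a
- deliberately naive Python sketch: both the query term and the
- text words are reduced to a singular form before comparison.
- The suffix rules and the names depluralize and matches are
- assumptions for illustration; they do not describe the Glossary
- program or any particular external package, and they will
- mishandle irregular nouns.
-
```python
def depluralize(term):
    """Reduce a regular English plural to its singular form (naive rules)."""
    t = term.lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"          # policies -> policy
    if t.endswith(("ches", "shes", "sses", "xes")):
        return t[:-2]                # taxes -> tax, classes -> class
    if t.endswith("s") and not t.endswith("ss"):
        return t[:-1]                # audits -> audit
    return t


def matches(query, text):
    """True if any word of text equals the query, ignoring number."""
    target = depluralize(query)
    return any(depluralize(word) == target for word in text.split())


if __name__ == "__main__":
    print(matches("audit", "Internal audits completed"))
    print(matches("audits", "the audit plan"))
```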
-
- Item B.3.b.v.D. addressed the approach of automated
- synonym definition. The KWOC index-building program module
- includes automatic synonym-handling, with cross-reference
- insertion for terms in the main KWOC index listing. The
- hypertext system can also use links to external searching
- programs with synonym definition capability. The item was
- rated as present.
-
- Item B.3.c. referred to the use of standardized
- subdivision or facet identification approach to
- consistently identify document types. This was judged as a
- matter of editorial decision. The DaTa production
- operation uses standard filetype naming conventions and
- special coding to reflect document types. The KWOC index-
- building program uses the codes to group document types
- when sorting the KWOC index entries. This item was rated
- as present.
-
- The approach of term or descriptor relationships was
- covered in Item B.3.d. This referred to techniques using
- term roles or links, or to the weighting of terms in
- database searching. The interviewees observed that this
- text searching methodology was not relevant to the
- hypertext associative linking approach. However, it could
- be achieved by link calls to execute external programs with
- such abilities. The item was therefore rated as easily
- achievable.
-
- Item B.3.e. covered the use of filing or sorting
- rules in building an information system. As an automated
- system, MaxThink presently uses simple ASCII sorting for
- the KWOC index. They do provide for subsorts by document
- type or hypertext node type. This is an editorial
- decision. Other sorting approaches could be incorporated
- by use of appropriate algorithms. The item was rated as
- present.
-
- Item B.3.f. referred to the use of automated
- authority and procedural safety or editorial and quality
- control measures. The MaxThink system uses the Hyperlink
- (1988) collection of separate utility programs for these
- purposes (Fersko-Weiss 1991 <refs -fersko>; Perez 1991
- <refs -perez>; Urr 1991 <refs 23 8>). The utilities perform
- such functions as: checking the spread or clustering of
- associative linking of nodes; checking for blind or erroneous
- link references, correcting link errors; automatic linking to
- files containing defined terms or phrases; and importing of
- ASCII file node names into the matrix outliner (for efficient
- translation into the conceptual taxonomy). These utilities are
- fully described in MaxThink system documentation (TransText
- 1990) <refs -transtext>.
-
- The DaTa production operation additionally employs
- standard computer operating procedures to ensure
- information system security. This includes duplicate
- working copies, regular backup of working files, off-site
- storage of copies of CD-ROM masters and tape duplicates,
- and other such methods. Item B.3.f. was therefore rated as
- present.
-
-
- Control Mechanisms Summary
-
- Once again, the great majority of items, 93.3%, were
- rated as present or easily achieved. Figures 9 and 10
- graphically present these results. In this section of the
- study, nine of the items (60%) were present; five of the
- items (33.3%) were rated as easily achieved; and one (6.7%)
- was rated as not possible or practical.
-
-
- >>>> FIGURES 9 AND 10 GO HERE
-
- This pattern was similar to the first section, with
- the majority of items falling into the present or
- implemented category. The investigator felt that this
- reflected the principals' ongoing hypertext publishing
- activity. They are regularly producing a large and complex
- hypertext product, and have had to devise effective
- production and editorial control measures.
-
-
- Overall Summary of Study Findings
-
- The study found that the great majority (95.8%) of
- all IAM items were rated as present or easily achieved in
- the subject system. Twenty-seven of the items (56.3%) were
- present, nineteen (39.6%) were judged as easily achieved,
- and only two (4.2%) were rated as not possible or
- practical. Figures 11 and 12 graphically present the
- results of totalling items from all sections of the study.
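-
- The percentages reported for the Access Approaches/Systems
- section, the Control Mechanisms section, and the overall totals
- can be reproduced from the item counts given in the text. The
- Python sketch below is only an arithmetic check; the function
- name shares is hypothetical, and half-up rounding to one
- decimal place is an assumption about how the figures were
- rounded.
-
```python
from decimal import Decimal, ROUND_HALF_UP


def shares(present, easy, not_possible):
    """Return the three ratings as percentages of the section total."""
    total = present + easy + not_possible

    def pct(n):
        value = Decimal(100 * n) / Decimal(total)
        # Half-up rounding (assumed): 27/48 is exactly 56.25 percent.
        return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

    return pct(present), pct(easy), pct(not_possible)


if __name__ == "__main__":
    # Counts (present, easily achievable, not possible) as stated in the text.
    counts = {
        "Access Approaches/Systems": (7, 11, 1),
        "Control Mechanisms": (9, 5, 1),
        "All sections": (27, 19, 2),
    }
    for section, tally in counts.items():
        print(section, shares(*tally))
```
-
- Because each percentage is rounded separately, the three
- overall figures (56.3, 39.6, and 4.2) sum to 100.1 rather than
- exactly 100.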
-
- >>>> FIGURES 11 AND 12 GO HERE
-
- The investigator noted that all nineteen of the
- easily achieved items were presented as possible to
- implement through the use of external or third party
- software. He therefore further categorized the purpose or
- functionality of the required external software, into two
- categories.
-
- The first group included software required to
- actually perform a retrieval approach or function. There
- were four of these items. These included Items B.1.j.
- (access by format of document), B.1.k. (access to internal
- document location or position), B.1.l. (specified field
- access), and B.3.d. (term roles, links, or weighting).
-
- The second category was software serving the purpose
- of editorial or authoring process control. This would
- include such functions as thesaurus maintenance,
- maintenance of a classification schedule, production of a
- faceted or string index, etc. The remaining fifteen items
- fell into this category.
-
- These relationships are listed
- in Table 4. The analysis shows that external software
- needs of the subject system are primarily in the area of
- editorial or process controls, rather than in the retrieval
- mechanism or function area.
-
- <TABLE4>
-
- The summary bar chart in Figure 13 presents the
- classification of software needs across the three IAM
- categories. In the Access Systems (Approaches/Methods) and
- the Control Mechanisms categories, the external software
- needs were similarly heavily weighted towards the editorial
- or process controls area. Only the Access Points category
- fell outside this pattern: all three of its external
- software needs were for performing the missing function.
-
- >>>> FIGURE 13 GOES HERE
-
- This concludes the Results and Discussion of
- Findings chapter. The final chapter will present the
- summary and conclusions of the study, and make
- recommendations for further research. It will also offer
- generalizations from this specific system case study that
- may be applied to the broader hypertext genre.